A Machine Learning based Central Unit Detector for Basque Scientific Texts
This paper presents the first machine-learning-based detector of the discourse central unit (CU) in Basque scientific abstracts. After segmenting a text into its elementary discourse units, detecting the central unit is a crucial step toward robustly building discourse trees under Rhetorical Structure Theory (RST). CU detection may also be useful in automatic summarization, question answering, and sentiment analysis. Results show that machine learning techniques outperform rule-based techniques for CU detection in Basque scientific abstracts, even with a small corpus spanning heterogeneous domains, and that there is still room for improvement. This work was partially funded by project TIN2015-65308-C5-1-R (MINECO/FEDER).
EriBERTa: A Bilingual Pre-Trained Language Model for Clinical Natural Language Processing
The utilization of clinical reports for various secondary purposes, including
health research and treatment monitoring, is crucial for enhancing patient
care. Natural Language Processing (NLP) tools have emerged as valuable assets
for extracting and processing relevant information from these reports. However,
the availability of specialized language models for the clinical domain in
Spanish has been limited.
In this paper, we introduce EriBERTa, a bilingual domain-specific language
model pre-trained on extensive medical and clinical corpora. We demonstrate
that EriBERTa outperforms previous Spanish language models in the clinical
domain, showcasing its superior capabilities in understanding medical texts and
extracting meaningful information. Moreover, EriBERTa exhibits promising
transfer learning abilities, allowing for knowledge transfer from one language
to another. This aspect is particularly beneficial given the scarcity of
Spanish clinical data.
Advances in monolingual and crosslingual automatic disability annotation in Spanish
Background
Unlike the recognition of diseases, the automatic recognition of disabilities has received little attention in medical NLP. Progress in this direction is hampered by obstacles such as the lack of annotated corpora. Neural architectures learn to translate sequences from spontaneous representations into their corresponding standard representations given a set of samples. The aim of this paper is to present the latest advances in monolingual (Spanish) and crosslingual (from English to Spanish and vice versa) automatic disability annotation. The task consists of identifying disability mentions in medical texts written in Spanish within a collection of abstracts from journal papers in the biomedical domain.
Results
In order to carry out the task, we have combined deep learning models that use different embedding granularities for sequence to sequence tagging with a simple acronym and abbreviation detection module to boost the coverage.
Conclusions
Our monolingual experiments demonstrate that a good combination of different word embedding representations provides better results than single representations, significantly outperforming the state of the art in disability annotation in Spanish. Additionally, we have experimented with crosslingual (zero-shot) transfer for disability annotation between English and Spanish, with interesting results that might help overcome the data scarcity bottleneck, which is especially significant for disabilities. This work was partially funded by the Spanish Ministry of Science and Innovation (MCI/AEI/FEDER, UE, DOTT-HEALTH/PAT-MED PID2019-106942RB-C31), the Basque Government (IXA IT1570-22), MCIN/AEI/ 10.13039/501100011033 and European Union NextGeneration EU/PRTR (DeepR3, TED2021-130295B-C31) and the EU ERA-Net CHIST-ERA and the Spanish Research Agency (ANTIDOTE PCI2020-120717-2).
Overview of the ClinAIS Task at IberLEF 2023: Automatic Identification of Sections in Clinical Documents in Spanish
The ClinAIS shared task, organized by IOMED and the HiTZ center, aims to tackle the identification of seven section types within unstructured clinical records in Spanish. These records, known as Electronic Clinical Narratives (ECNs), store crucial individual health information. However, their lack of standardized formats poses challenges for the development and evaluation of automated systems for clinical document analysis. Twenty-seven participants registered for the task, and five submitted results. This paper presents the outcomes and methodologies used in ClinAIS, contributing to the advancement of clinical text analysis and its application in improving healthcare decision-making and patient care. This work was partially funded by the Spanish Ministry of Science and Innovation (MCI/AEI/FEDER, UE, DOTT-HEALTH/PAT-MED PID2019-106942RB-C31), the Basque Government (IXA IT1570-22), MCIN/AEI/ 10.13039/501100011033, European Union NextGeneration EU/PRTR (DeepR3 TED2021-130295B-C31, ANTIDOTE PCI2020-120717-2 EU ERA-Net CHIST-ERA), and the Government of the United States IARPA BETTER program (INT NOCORE 19/08 project, via Contract No. 2019-19051600006).
HiTZ@Antidote: Argumentation-driven Explainable Artificial Intelligence for Digital Medicine
Providing high quality explanations for AI predictions based on machine
learning is a challenging and complex task. To work well it requires, among
other factors: selecting a proper level of generality/specificity of the
explanation; considering assumptions about the familiarity of the explanation
beneficiary with the AI task under consideration; referring to specific
elements that have contributed to the decision; making use of additional
knowledge (e.g. expert evidence) which might not be part of the prediction
process; and providing evidence supporting negative hypotheses. Finally, the
system needs to formulate the explanation in a clearly interpretable, and
possibly convincing, way. Given these considerations, ANTIDOTE fosters an
integrated vision of explainable AI, where low-level characteristics of the
deep learning process are combined with higher-level schemes characteristic of
human argumentation. ANTIDOTE will exploit cross-disciplinary
competences in deep learning and argumentation to support a broader and
innovative view of explainable AI, where the need for high-quality explanations
for clinical cases deliberation is critical. As a first result of the project,
we publish the Antidote CasiMedicos dataset to facilitate research on
explainable AI in general, and argumentation in the medical domain in
particular. Comment: To appear in SEPLN 2023: 39th International Conference of the
Spanish Society for Natural Language Processing
Construction of a Syntactically Annotated Corpus for Basque
The aim of this work is the construction of a syntactically annotated treebank for
Basque. In this paper we first present the basis of the annotation. After examining several
options, we chose the scheme presented in (Carrol et al., 1998). It follows the EAGLES
standards and is based on the idea of adding to each sentence in the corpus a series of
grammatical relations specifying the dependencies between modifiers and their nucleus. Once
the formalism has been presented, we describe the problems we have found and the
decisions we have taken to solve them. Next, we present an example showing the application of
the scheme to an initial corpus. Finally, we present the main conclusions about the suitability
of the scheme for Basque, along with future work. This work was carried out within the project
"Construcción de una base de datos de árboles sintácticos y semánticos", funded by the
Ministerio de Educación y Ciencia (PROFIT: FIT-150500-2002-244).
Community Pharmacy Internship Report
Internship report carried out as part of the Integrated Master's in Pharmaceutical Sciences, submitted to the Faculdade de Farmácia da Universidade de Coimbra
Telicity: a Lexical Feature or a Derived Concept?
This paper deals with the phenomenon of verbal aspectual classes and the way they are articulated in the lexicon. Over the last thirty years, various aspectual classifications have been proposed. However, there is still no consensus on the way verbs should be classified with respect to aspect. The difficulty arises because verbs can shift from one aspectual category to another depending on several factors, such as the arguments the verb takes or the prepositional phrases that appear in the sentence.
A Machine-Learning-Based Detector of a Text's Central Unit in Scientific Texts for Basque
In this article we present the first detector of the Central Unit (CU) of scientific abstracts in Basque based on machine learning techniques. After segmenting the text into elementary discourse units, detecting the central unit is crucial for annotating the relational structure of texts more reliably under Rhetorical Structure Theory (RST). In addition, the central unit can be exploited in various tasks such as automatic summarization, question answering, or sentiment analysis. The results show that machine learning techniques outperform rule-based techniques despite the small size of the corpus and the heterogeneity of its domains, while still leaving room for improvement and further development. This work was partially funded by project TIN2015-65308-C5-1-R (MINECO/FEDER).